Sparse Substring Pattern Set Discovery Using Linear Programming Boosting

Authors

  • Kazuaki Kashihara
  • Kohei Hatano
  • Hideo Bannai
  • Masayuki Takeda
Abstract

In this paper, we consider finding a small set of substring patterns that classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem in which each dimension corresponds to a substring pattern, and we solve it using LPBoost together with an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.
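The abstract can be illustrated with a minimal sketch. This is not the authors' code: each weak hypothesis tests whether a substring occurs in a document, and the weak learner picks the pattern with the best weighted edge (the same column-generation step LPBoost uses), but an AdaBoost-style multiplicative update stands in for the 1-norm soft-margin LP master problem, and candidate substrings are capped at a small length. All function names and the `max_len` parameter are illustrative choices.

```python
# Sketch: boosting over substring-pattern features.
# Assumption: the LP master problem of LPBoost is replaced here by a
# simple AdaBoost-style reweighting, since a real LP solver is out of
# scope for a short example; the pattern-selection (pricing) step is
# analogous.
import math

def all_substrings(doc, max_len=4):
    """All substrings of doc up to length max_len (small cap for the demo)."""
    return {doc[i:j] for i in range(len(doc))
            for j in range(i + 1, min(i + max_len, len(doc)) + 1)}

def best_pattern(docs, labels, weights):
    """Weak learner: the substring pattern maximizing the weighted edge."""
    candidates = set().union(*(all_substrings(d) for d in docs))
    def edge(p):
        return sum(w * y * (1 if p in d else -1)
                   for d, y, w in zip(docs, labels, weights))
    return max(candidates, key=edge)

def boost(docs, labels, rounds=5):
    n = len(docs)
    weights = [1.0 / n] * n
    model = []  # (pattern, coefficient) pairs; in the LP most would be zero
    for _ in range(rounds):
        p = best_pattern(docs, labels, weights)
        preds = [1 if p in d else -1 for d in docs]
        err = sum(w for w, y, h in zip(weights, labels, preds) if y != h)
        err = min(max(err, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((p, alpha))
        # Multiplicative reweighting: up-weight misclassified documents.
        weights = [w * math.exp(-alpha * y * h)
                   for w, y, h in zip(weights, labels, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return model

def predict(model, doc):
    score = sum(a * (1 if p in doc else -1) for p, a in model)
    return 1 if score >= 0 else -1
```

On a toy corpus such as `boost(["good movie", "great film", "bad movie", "awful film"], [1, 1, -1, -1])`, the learner repeatedly selects a single discriminative substring, mirroring how the sparse LP solution concentrates weight on a few patterns.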


Related articles

Linear Programming Boosting by Column and Row Generation

We propose a new boosting algorithm based on a linear programming formulation. Our algorithm can take advantage of the sparsity of the solution of the underlying optimization problem. In preliminary experiments, our algorithm outperforms a state-of-the-art LP solver and LPBoost especially when the solution is given by a small set of relevant hypotheses and support vectors.


A Template Discovery Algorithm by Substring Amplification

In this paper, we consider finding a set of substrings common to given strings. We define this as the template discovery problem: given a set of strings generated by some fixed but unknown pattern, find the constant parts of the pattern. A pattern is a string over constant and variable symbols; it generates strings by replacing variables with constant strings. We assume that...


A Column Generation Algorithm For Boosting

We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation simplex method. We prove that minimizing the soft margin error function (equivalent to solving an LP) directly optimizes a generalization error bound. LPBoost can be used to solve any boosting LP by iteratively optimizing the dual classification costs in a restricted...


Optimization for Sparse and Accurate Classifiers

Abstract of the dissertation "Optimization for Sparse and Accurate Classifiers" by Noam Goldberg; Dissertation Director: Professor Jonathan Eckstein. Classification and supervised learning problems in general aim to choose a function that best describes a relation between a set of observed attributes and their corresponding outputs. We focus on binary classification, where the output is a binary response va...


Tightened L0-Relaxation Penalties for Classification, by Noam Goldberg and Jonathan Eckstein

In optimization-based classification model selection, for example when using linear programming formulations, a standard approach is to penalize the L1 norm of some linear functional in order to select sparse models. Instead, we propose a novel integer linear program for sparse classifier selection, generalizing the minimum disagreement hyperplane problem whose complexity has been investigated ...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2010